Training State-of-the-Art Portuguese POS Taggers without Handcrafted Features

نویسندگان

  • Cícero Nogueira dos Santos
  • Bianca Zadrozny
چکیده

Part-of-speech (POS) tagging for morphologically rich languages normally requires the use of handcrafted features that encapsulate clues about the language’s morphology. In this work, we tackle Portuguese POS tagging using a deep neural network that employs a convolutional layer to learn character-level representation of words. We apply the network to three different corpora: the original Mac-Morpho corpus; a revised version of the Mac-Morpho corpus; and the Tycho Brahe corpus. Using the proposed approach, while avoiding the use of any handcrafted feature, we produce state-of-the-art POS taggers for the three corpora: 97.47% accuracy on the Mac-Morpho corpus; 97.31% accuracy on the revised Mac-Morpho corpus; and 97.17% accuracy on the Tycho Brahe corpus. These results represent an error reduction of 12.2%, 23.6% and 15.8%, respectively, on the best previous known result for each corpus.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Boosting Named Entity Recognition with Neural Character Embeddings

Most state-of-the-art named entity recognition (NER) systems rely on handcrafted features and on the output of other NLP tasks such as part-of-speech (POS) tagging and text chunking. In this work we propose a language-independent NER system that uses automatically learned features only. Our approach is based on the CharWNN deep neural network, which uses word-level and character-level represent...

متن کامل

A POS Tagger for Code Mixed Indian Social Media Text - ICON-2016 NLP Tools Contest Entry from Surukam

Building Part-of-Speech (POS) taggers for code-mixed Indian languages is a particularly challenging problem in computational linguistics due to a dearth of accurately annotated training corpora. ICON, as part of its NLP tools contest has organized this challenge as a shared task for the second consecutive year to improve the state-of-the-art. This paper describes the POS tagger built at Surukam...

متن کامل

TagMiner: A Semisupervised Associative POS Tagger Effective for Resource Poor Languages

We present here, TagMiner, a data mining approach for part-of-speech (POS) tagging, an important Natural language processing (NLP) classification task. It is a semi-supervised associative classification method for POS tagging. Existing methods for building POS taggers require extensive domain and linguistic knowledge and resources. Our method uses combination of a small POS tagged corpus and a ...

متن کامل

Choosing a Spanish Part-of-Speech tagger for a lexically sensitive task

In this article, four Part-of-Speech (PoS) taggers for Spanish are compared. The evaluation has been carried out without prior training or tuning of the PoS taggers. To allow for a comparison across PoS taggers, their tagsets have been mapped to the universal PoS tagset (Petrov, Das, and McDonald, 2012). The PoS taggers have also been compared as regards the information they provide and how the...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014